2 Report Abstract

Our accomplishments fall into several categories. In [1] and [2], we consider different aspects
of  the  problem  of  distributed  function  computation  across  a  network  and  propose  a  network 
flow  approach  to  characterize  trade-offs  between  computation  and  communication  costs. 
Moreover,  we  demonstrate  an  application  of  this  framework  in  cloud  computing.  In  [3], 
we  consider  a  network  coded  distributed  storage  problem  in  highly  dynamic  environments 
where the availability of nodes and communication links is volatile. We propose a robust
decentralized  network  coded  approach  that  requires  a  small  number  of  repair  nodes  for  node 
recovery.  In  [4],  we  propose  a  reliable,  multi-path  protocol  called  Multi-Path  TCP  with 
Network  Coding  (MPTCP/NC)  and  show  that  it  can  provide  users  in  mobile  environments 
a  higher  quality  of  service  by  enabling  the  use  of  multiple  network  technologies  and  the 
capability  to  overcome  packet  losses.  In  [5],  we  introduce  tunable  sparse  network  coding 
(TSNC),  a  scheme  in  which  the  density  of  network  coded  packets  varies  during  a  transmission 
session.  We  also  propose  a  family  of  tunable  sparse  codes  for  multicast  erasure  networks  with 
a  controllable  trade-off  between  completion  time  performance  and  decoding  complexity.  In 
[6] and [7], we investigate different aspects of coding over two-unicast networks, such as the
tightness  of  the  generalized  network  sharing  bound.  We  also  develop  a  new  linear  network 
coding  algorithm  for  two-unicast-Z  networks  over  directed  acyclic  graphs.  In  [8],  we  analyze 
the  matched  filter  decoding  error  probability  in  random  binary  and  Gaussian  coding  setups 
and  show  that  the  performance  in  the  two  cases  is  surprisingly  similar  without  explicit 
adaptation of the codeword construction to the modulation. In [9], we study an adaptive
sampling  scheme  where  sampling  times  do  not  need  to  be  stored/transmitted  since  they  can 
be computed using a function of previously taken samples. The sampling process is therefore
energy efficient, especially when the signal varies slowly. In [10], we
present  a  general  method  for  inferring  direct  effects  from  an  observed  correlation  matrix 
containing  both  direct  and  indirect  effects,  while  in  [11],  we  introduce  a  method  to  solve  the 
inverse  problem  of  identifying  the  source  of  the  propagated  signal  for  large,  complex  real 
world  networks. 


3  Report  Abstract 

Our  accomplishments  fall  into  several  categories. 

Distributed Functional Computation over Networks: In [1], we consider different
aspects  of  the  problem  of  compressing  for  function  computation  across  a  network,  which 
we  call  network  functional  compression.  In  network  functional  compression,  computation  of 
a function (or several functions) of sources located at certain nodes in a network is desired
at the receiver(s). The rate region of this problem has been considered in the literature under
certain  restrictive  assumptions,  particularly  in  terms  of  the  network  topology,  the  functions, 
and  the  characteristics  of  the  sources.  In  [1],  we  present  results  that  significantly  relax  these 
assumptions.  For  a  one-stage  tree  network,  we  characterize  a  rate  region  by  introducing  a 
necessary and sufficient condition for any achievable coloring-based coding scheme, called the
coloring connectivity condition. We also propose a modularized coding scheme based on graph
colorings  to  perform  arbitrarily  closely  to  rate  lower  bounds.  For  a  general  tree  network,  we 
provide a rate lower bound based on graph entropies and show that this bound is tight in the
case of independent sources. In particular, we show that in a general tree network
with independent sources, intermediate nodes should perform computations to achieve the rate lower bound.
perform  computations.  However,  for  a  family  of  functions  and  random  variables,  which  we 
call  chain  rule  proper  sets,  it  is  sufficient  to  have  no  computations  at  intermediate  nodes  to 
perform  arbitrarily  closely  to  the  rate  lower  bound.  In  addition,  we  consider  practical  issues 
of  coloring-based  coding  schemes  and  propose  an  efficient  algorithm  to  compute  a  minimum 
entropy  coloring  of  a  characteristic  graph  under  some  conditions  on  source  distributions 
and/or the desired function. Finally, we discuss extensions of these results to settings with feedback
and lossy function computation.
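
To make the coloring-based idea concrete, the following is a minimal sketch under stated assumptions. It builds the characteristic graph of one source with respect to a second source and a target function, colors it, and reports the entropy of the coloring. The function names, the toy distribution, and the use of a greedy heuristic are illustrative assumptions; the greedy coloring is not the minimum-entropy coloring algorithm of [1] and is not guaranteed to be optimal in general.

```python
from itertools import combinations
from math import log2

def characteristic_graph(xs, ys, joint, f):
    """Join x and x' if some y, jointly possible with both, gives different f values."""
    edges = set()
    for a, b in combinations(xs, 2):
        for y in ys:
            if joint.get((a, y), 0) > 0 and joint.get((b, y), 0) > 0 and f(a, y) != f(b, y):
                edges.add((a, b))
                break
    return edges

def greedy_coloring(xs, edges):
    """Heuristic proper coloring (assumption: stands in for a minimum-entropy coloring)."""
    adj = {x: set() for x in xs}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    for x in sorted(xs, key=lambda v: -len(adj[v])):
        used = {color[n] for n in adj[x] if n in color}
        color[x] = next(c for c in range(len(xs)) if c not in used)
    return color

def coloring_entropy(color, px):
    """Entropy (bits) of the color random variable induced by the source distribution."""
    mass = {}
    for x, c in color.items():
        mass[c] = mass.get(c, 0.0) + px[x]
    return -sum(m * log2(m) for m in mass.values() if m > 0)

# Toy instance: X1 uniform on {0,1,2,3}, X2 uniform on {0,1}, f = parity of the sum.
xs, ys = [0, 1, 2, 3], [0, 1]
joint = {(x, y): 1 / 8 for x in xs for y in ys}
px = {x: 0.25 for x in xs}
f = lambda x, y: (x + y) % 2

edges = characteristic_graph(xs, ys, joint, f)
colors = greedy_coloring(xs, edges)
rate = coloring_entropy(colors, px)
```

In this toy instance only the parity of X1 matters, so two colors suffice and sending the color costs 1 bit, compared with the 2 bits needed to send X1 itself.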

In [2], by using network flow principles, we propose algorithms to address various challenges
in  cloud  computing.  One  of  the  main  challenges  is  to  consider  both  communication  and 
computation  constraints  in  the  network.  In  the  proposed  network  flow  framework,  we  model 
the  amount  of  computation  in  each  node  of  the  network  as  a  function  of  its  total  self-loop 
flows.  We  consider  two  computation  cost  models:  a  linear  computation  cost  model  and 
a maximum computation cost model. We show that our network flow framework can be
used as a systematic technique for balancing computation loads across the nodes of the
network.  This  network  flow  framework  can  also  be  used  for  cloud  network  design.  A  network 
topology is optimal for certain computations if it maximizes the total computation rate under
communication/computation constraints. We propose a greedy algorithm to design a cloud
network with given characteristics in terms of communication and computation
costs.  We  provide  simulation  results  to  illustrate  the  performance  of  our  algorithms. 

Network Coded Distributed Storage: In distributed cloud storage, fault tolerance is
achieved  by  regenerating  the  lost  data  from  the  surviving  clouds.  Recent  studies  suggest 
using  maximum  distance  separable  (MDS)  network  codes  in  cloud  storage  systems  to  allow 
efficient  and  reliable  recovery  after  node  faults.  MDS  codes  are  designed  to  use  a  substantial 
number  of  repair  nodes  and  rely  on  centralized  management  and  a  static  fully  connected 
network between the nodes. However, in highly dynamic environments, such as edge caching in
communication networks or peer-to-peer networks, the availability of nodes and communication links
is highly volatile. In these scenarios, the functionality of MDS codes is limited. In [3], we
study  a  non-MDS  network  coded  approach,  which  operates  in  a  decentralized  manner  and 
requires a small number of repair nodes for node recovery. We investigate the long-term behavior
of  the  modeled  system  and  demonstrate,  analytically  and  numerically,  the  durability  gains 
over  uncoded  storage. 

Multi-Path  TCP  with  Network  Coding:  Existing  mobile  devices  have  the  capability 
to use multiple network technologies simultaneously to help increase performance, but they
rarely, if ever, use these technologies effectively in parallel. In [4], we first present empirical
data  to  help  understand  the  mobile  environment  when  three  heterogeneous  networks  are 
available  to  the  mobile  device  (i.e.,  a  WiFi  network,  WiMax  network,  and  an  Iridium  satellite 
network).  We  then  propose  a  reliable,  multi-path  protocol  called  Multi-Path  TCP  with 
Network  Coding  (MPTCP/NC)  that  utilizes  each  of  these  networks  in  parallel.  An  analytical 
model is developed and a mean-field approximation is derived that gives an estimate of the
protocol's achievable throughput. Finally, a comparison between MPTCP and MPTCP/NC
is presented using both the empirical data and the mean-field approximation. Our results show
that  network  coding  can  provide  users  in  mobile  environments  a  higher  quality  of  service 
by  enabling  the  use  of  multiple  network  technologies  and  the  capability  to  overcome  packet 
losses  due  to  lossy,  wireless  network  connections. 

Tunable Sparse Network Coding: In [5], we show the potential and key enabling mechanisms
for tunable sparse network coding, a scheme in which the density of network coded
packets  varies  during  a  transmission  session.  At  the  beginning  of  a  transmission  session, 
sparsely coded packets are transmitted, which reduces decoding complexity. As the transmission
continues and the receivers accumulate coded packets, the coding density
is increased. We propose a family of tunable sparse network codes (TSNCs) for multicast
erasure networks with a controllable trade-off between completion time performance and decoding
complexity. Coding density tuning can be performed by designing time-dependent
coding  matrices.  In  multicast  networks,  this  tuning  can  be  performed  within  the  network 
by  designing  time-dependent  pre-coding  and  network  coding  matrices  with  mild  conditions 
on  the  network  structure  for  specific  densities.  We  present  a  mechanism  to  perform  efficient 
Gaussian elimination over sparse matrices, going beyond belief propagation while maintaining
low decoding complexity. Supporting implementation results are provided showing the
trade-off  between  decoding  complexity  and  completion  time. 
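
The density-tuning idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the code construction of [5]. It works over GF(2), uses a hypothetical linear schedule (`density_schedule`) that raises the coding density with the transmission index, and lets the receiver track innovative packets by incremental Gaussian elimination on bitmask-encoded coding vectors. All names and parameter values are illustrative.

```python
import random

def density_schedule(t, n, d_min=0.1, d_max=0.5):
    """Hypothetical schedule: density rises linearly with time t, capped at d_max."""
    return min(d_max, d_min + (d_max - d_min) * t / n)

def coding_vector(n, density, rng):
    """Random nonzero GF(2) coding vector, stored as an n-bit integer."""
    while True:
        v = sum(1 << i for i in range(n) if rng.random() < density)
        if v:
            return v

def insert(basis, v):
    """Reduce v against the basis; keep it if innovative. basis maps pivot bit -> row."""
    while v:
        p = v.bit_length() - 1
        if p not in basis:
            basis[p] = v
            return True
        v ^= basis[p]  # cancel the leading bit and continue reducing
    return False

def simulate(n, seed=0):
    """Send coded packets until the receiver has n independent ones."""
    rng = random.Random(seed)
    basis, weights = {}, []
    while len(basis) < n:
        v = coding_vector(n, density_schedule(len(weights), n), rng)
        weights.append(bin(v).count("1"))  # number of source packets mixed in
        insert(basis, v)
    return len(weights), weights

sent, weights = simulate(16, seed=1)
```

Early packets mix few source packets and are cheap to process, while later packets are denser and therefore more likely to be innovative for a receiver that is close to decoding; `sent` gives the completion time in packets for this toy run.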

Coding Algorithms for Two-Unicast Networks: In [6], we study two-unicast-Z networks:
two-source, two-destination (two-unicast) wireline networks over directed acyclic graphs,
where one of the two destinations (say, the second destination) is a priori aware of the
interfering (first) source's message. For certain classes of two-unicast-Z networks, we show that
the rate tuple (N, 1) is achievable as long as the individual source-destination cuts for the
two source-destination pairs are at least N and 1, respectively, and the generalized
network sharing (GNS) cut, a bound previously defined by Kamath et al., is at least
N + 1. We show this through a novel achievable scheme based on random linear
coding at all edges in the network except the GNS-cut set edges, where the linear
coding coefficients are chosen in a structured manner to cancel interference at the
first destination.

In [7], we derive a new linear network coding algorithm for two-unicast-Z networks over
directed acyclic graphs, that is, for two-unicast networks where one destination has a priori
information of the interfering source message. Our algorithm discovers linear network
codes  for  two-unicast-Z  networks  by  combining  ideas  of  random  linear  network  coding  and 
interference  neutralization.  We  show  that  our  algorithm  outputs  an  optimal  network  code 
for  networks  where  there  is  only  one  edge  emanating  from  each  of  the  two  sources.  The 
complexity  of  our  algorithm  is  polynomial  in  the  number  of  edges  of  the  graph. 

Matched  Filter  Decoding:  In  [8],  we  consider  the  additive  white  Gaussian  noise  channel 
with  an  average  input  power  constraint  in  the  power-limited  regime.  A  well-known  result  in 
information theory states that the capacity of this channel can be achieved by random Gaussian
coding with analog quadrature amplitude modulation (QAM). In practical applications,
however,  discrete  binary  channel  codes  with  digital  modulation  are  most  often  employed.  We 
analyze  the  matched  filter  decoding  error  probability  in  random  binary  and  Gaussian  coding 
setups in the wide bandwidth regime, and show that the performance in the two cases is
surprisingly similar without explicit adaptation of the codeword construction to the modulation.
The result also holds for the multiple access and the broadcast Gaussian channels
when the signal-to-noise ratio is low. Moreover, the two modulations can even be mixed together
in a single codeword, resulting in a hybrid modulation with asymptotically close decoding
behavior. In this sense, the performance of the matched filter decoder is
largely insensitive to the choice of binary versus Gaussian modulation.

Time-Stampless Adaptive Nonuniform Sampling: Advances in sampling and coding
theory have contributed significantly towards lowering the power consumption of resource-constrained
devices, e.g., battery-operated sensor nodes, enabling them to operate for extended
periods of time. In [9], the rate and energy efficiency of a recently proposed adaptive
nonuniform sampling framework by Feizi et al., called Time-Stampless Adaptive Nonuniform
Sampling (TANS), are examined and compared against state-of-the-art methods. TANS
addresses  one  of  the  main  limitations  of  nonuniform  sampling  schemes:  sampling  times  do 
not need to be stored/transmitted since they can be computed using a function of previously
taken samples. The sampling rate is adapted continuously with the aim of reducing
the  rate  and  therefore  the  energy  consumption  of  the  sampling  process  when  the  signal  is 
varying  slowly.  Three  TANS  methods  are  proposed  for  different  signal  models  and  sampling 
requirements: i) TANS by polynomial extrapolation, which only assumes the third derivative
of the signal is bounded but requires no other specific knowledge of the signal; ii) TANS
by  incremental  variation,  where  the  sampling  time  intervals  are  chosen  from  a  lattice;  and 
iii)  TANS  constrained  to  a  finite  set  of  sampling  rates.  Practical  implementation  details 
of  TANS  are  discussed,  and  its  rate  and  energy  performance  are  compared  with  uniform 
sampling followed by transformation-based compression, nonuniform sampling, and compressed
sensing. Our results demonstrate that TANS provides significant improvements in
terms of both rate-distortion performance and energy consumption compared with
the other approaches.
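
The time-stampless property can be illustrated with a toy example. This is a sketch under stated assumptions and does not reproduce any of the three TANS methods in [9]. The update rule `next_interval`, its thresholds, and the test signal are hypothetical; the point is only that the sampler and the receiver apply the same function to previously taken samples, so the receiver can rebuild every sampling time from the sample values alone.

```python
import math

def next_interval(prev_interval, last, second_last,
                  threshold=0.05, step=0.01, t_min=0.01, t_max=0.2):
    """Hypothetical rule: lengthen the interval when the signal changed little,
    shorten it otherwise, and keep it within [t_min, t_max]."""
    if abs(last - second_last) < threshold:
        t = prev_interval + step
    else:
        t = prev_interval - step
    return min(t_max, max(t_min, t))

def sample(signal, duration, t0=0.05):
    """Sampler side: the first two samples use a fixed, agreed interval t0."""
    times = [0.0, t0]
    values = [signal(0.0), signal(t0)]
    interval = t0
    while True:
        interval = next_interval(interval, values[-1], values[-2])
        t = times[-1] + interval
        if t > duration:
            break
        times.append(t)
        values.append(signal(t))
    return values, times  # only `values` would be stored or transmitted

def recover_times(values, t0=0.05):
    """Receiver side: recompute the sampling times from the values alone."""
    times = [0.0, t0]
    interval = t0
    for k in range(2, len(values)):
        interval = next_interval(interval, values[k - 1], values[k - 2])
        times.append(times[-1] + interval)
    return times

values, times = sample(lambda t: math.sin(2 * math.pi * 0.5 * t), 5.0)
recovered = recover_times(values)
```

Because both sides execute the same arithmetic on the same inputs, `recovered` matches `times` exactly, and the intervals grow where the sine wave is flat and shrink where it changes quickly.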

Inference  of  Direct  Relationships  over  Networks:  Recognizing  direct  relationships 
between  variables  connected  in  a  network  is  a  pervasive  problem  in  biological,  social  and 
information  sciences  as  correlation-based  networks  contain  numerous  indirect  relationships. 
In  [10],  we  present  a  general  method  for  inferring  direct  effects  from  an  observed  correlation 
matrix  containing  both  direct  and  indirect  effects.  We  formulate  the  problem  as  the  inverse 
of  network  convolution,  and  introduce  an  algorithm  that  removes  the  combined  effect  of  all 
indirect  paths  of  arbitrary  length  in  a  closed-form  solution  by  exploiting  eigen-decomposition 
and  infinite-series  sums.  We  demonstrate  the  effectiveness  of  our  approach  in  several  network 
applications: distinguishing direct targets in gene expression regulatory networks; recognizing
directly interacting amino-acid residues for protein structure prediction from sequence
alignments;  and  distinguishing  strong  collaborations  in  co-authorship  social  networks  using 
connectivity information alone. Beyond its theoretical impact as a foundational graph-theoretic
tool, network deconvolution is, as our results suggest, widely applicable for computing
direct  dependencies  in  network  science  across  diverse  disciplines. 
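
For reference, the closed-form step described above can be written out as follows. This is a sketch, assuming the matrices have been scaled so that the infinite series converges; the symbols G_obs (observed matrix), G_dir (direct-effect matrix), U, and lambda_i are notation introduced here for illustration.

```latex
% Observed dependencies = direct effects plus indirect paths of every length.
% The series converges when all eigenvalues of G_dir have magnitude below 1.
\[
  G_{\mathrm{obs}}
  = G_{\mathrm{dir}} + G_{\mathrm{dir}}^{2} + G_{\mathrm{dir}}^{3} + \cdots
  = G_{\mathrm{dir}}\,(I - G_{\mathrm{dir}})^{-1}
\]

% Inverting the relation recovers the direct effects in closed form.
\[
  G_{\mathrm{dir}} = G_{\mathrm{obs}}\,(I + G_{\mathrm{obs}})^{-1}
\]

% With the eigen-decomposition G_obs = U diag(lambda_i^obs) U^{-1},
% the inversion acts on each eigenvalue separately.
\[
  \lambda_i^{\mathrm{dir}} = \frac{\lambda_i^{\mathrm{obs}}}{1 + \lambda_i^{\mathrm{obs}}},
  \qquad
  G_{\mathrm{dir}} = U\,\mathrm{diag}\!\left(\lambda_i^{\mathrm{dir}}\right) U^{-1}
\]
```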

Source Inference in Networks: Several models exist for diffusion of signals across biological,
social, or engineered networks. However, the inverse problem of identifying the source of
such propagated information appears more difficult even in the presence of multiple network
snapshots, and especially for the single snapshot case, given the many alternative, often similar,
progressions of diffusion that may lead to the same observed snapshots. Mathematically,
this  problem  can  be  undertaken  using  a  diffusion  kernel  that  represents  diffusion  processes 
in  a  given  network,  but  computing  this  kernel  is  computationally  challenging  in  general.  In 
[11],  we  propose  a  path-based  network  diffusion  kernel  which  considers  edge-disjoint  shortest 
paths among pairs of nodes in the network and can be computed efficiently for both homogeneous
and heterogeneous continuous-time diffusion models. We use this network diffusion
kernel to solve the inverse diffusion problem, which we term Network Infusion (NI), using
both likelihood maximization and error minimization. The minimum error NI algorithm is
based on an asymmetric Hamming premetric function and can balance between false positive
and false negative error types. We apply this framework for both single-source and
multi-source  diffusion,  for  both  single-snapshot  and  multi-snapshot  observations,  and  using 
both  uninformative  and  informative  prior  probabilities  for  candidate  source  nodes.  We  also 
provide proofs that under a standard susceptible-infected diffusion model, (1) the maximum-likelihood
NI is mean-field optimal for tree structures or sufficiently sparse Erdős-Rényi
graphs, (2) the minimum-error algorithm is mean-field optimal for regular tree structures,
and (3) for sufficiently distant sources, the multi-source solution is mean-field optimal in the
regular  tree  structure.  Moreover,  we  provide  techniques  to  learn  diffusion  model  parameters 
such as observation times. We apply NI to several synthetic networks and compare its performance
to centrality-based and distance-based methods for Erdős-Rényi graphs, power-law
networks, and symmetric and asymmetric grids. Moreover, we use NI in two real-world applications.
First, we identify the news sources for 3,553 stories in the Digg social news network
and validate our results based on annotated information that was not provided to our algorithm.
Second, we use NI to identify infusion hubs of human diseases, defined as gene
candidates  that  can  explain  the  connectivity  pattern  of  disease-related  genes  in  the  human 
regulatory  network.  NI  identifies  infusion  hubs  of  several  human  diseases  including  T1D, 